155 research outputs found

    Load Balancing for Mobility-on-Demand Systems

    In this paper we develop methods for maximizing the throughput of a mobility-on-demand urban transportation system. We consider a finite group of shared vehicles, located at a set of stations. Users arrive at the stations, pick up vehicles, and drive (or are driven) to their destination station, where they drop off the vehicle. When some origins and destinations are more popular than others, the system will inevitably become unbalanced: vehicles build up at some stations and are depleted at others. We propose a robotic solution to this rebalancing problem in which empty robotic vehicles autonomously drive between stations. We develop a rebalancing policy that minimizes the number of vehicles performing rebalancing trips. To do this, we use a fluid model of the customers and vehicles in the system, which takes the form of a set of nonlinear time-delay differential equations. We then show that the optimal rebalancing policy can be found as the solution to a linear program. By analyzing the dynamical system model, we show that every station reaches an equilibrium in which there are excess vehicles and no waiting customers. We use this solution to develop a real-time rebalancing policy that can operate in highly variable environments. We verify the policy's performance in a simulated mobility-on-demand environment with stochastic features found in real-world urban transportation networks.
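    A minimal sketch of the kind of linear program described above, assuming known customer arrival rates, destination probabilities, and station-to-station travel times; the formulation and all numbers are illustrative, not taken from the paper:

```python
# A hedged sketch: choose empty-vehicle dispatch rates alpha[i, j] that offset
# the customer-induced imbalance at each station while minimizing the expected
# number of vehicles on rebalancing trips (sum of T[i, j] * alpha[i, j]).
import numpy as np
from scipy.optimize import linprog

def rebalancing_rates(lam, p, T):
    """lam[i]: customer arrival rate at station i; p[i, j]: destination
    probabilities; T[i, j]: travel times. Returns alpha[i, j] >= 0."""
    n = len(lam)
    # Net vehicle inflow at station i due to customer trips.
    inflow = p.T @ lam - lam                      # shape (n,), sums to zero
    # Flow balance: sum_j alpha[i, j] - sum_j alpha[j, i] = inflow[i].
    A_eq = np.zeros((n, n * n))
    for i in range(n):
        for j in range(n):
            A_eq[i, i * n + j] += 1.0             # alpha[i, j] leaves station i
            A_eq[i, j * n + i] -= 1.0             # alpha[j, i] arrives at station i
    res = linprog(c=T.flatten(), A_eq=A_eq, b_eq=inflow, bounds=(0, None))
    return res.x.reshape(n, n)

# Toy example: three stations with asymmetric demand and unit travel times.
lam = np.array([1.0, 0.2, 0.5])
p = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.9, 0.1, 0.0]])
T = np.ones((3, 3)) - np.eye(3)
print(rebalancing_rates(lam, p, T))
```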

    High fidelity progressive reinforcement learning for agile maneuvering UAVs

    In this work, we present a high-fidelity, model-based progressive reinforcement learning method for control system design for an agile maneuvering UAV. Our work relies on a simulation-based training and testing environment for software-in-the-loop (SIL), hardware-in-the-loop (HIL), and integrated flight testing within a photo-realistic virtual reality (VR) environment. Through progressive learning with high-fidelity agent and environment models, the guidance and control policies build agile maneuvering on top of fundamental control laws. First, we provide insight into the development of high-fidelity mathematical models using frequency-domain system identification. These models are then used to design reinforcement learning based adaptive flight control laws, allowing the vehicle to be controlled over a wide range of operating conditions, covering changes such as payload, voltage, and damage to actuators and electronic speed controllers (ESCs). We then design the outer flight guidance and control laws. Our current progress is summarized in this paper.
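    A small, hedged illustration of the frequency-domain system-identification step mentioned above, using a synthetic chirp input and an assumed first-order plant in place of real flight data:

```python
# Estimate an empirical frequency response H(f) = S_uy(f) / S_uu(f) from
# recorded input u and output y. Everything here is synthetic and illustrative.
import numpy as np
from scipy import signal

fs = 200.0                                        # sample rate [Hz], assumed
t = np.arange(0, 60, 1 / fs)
u = signal.chirp(t, f0=0.1, f1=20.0, t1=t[-1])    # frequency-sweep excitation
# Synthetic "vehicle": first-order response plus noise, standing in for flight data.
b, a = signal.butter(1, 5.0, fs=fs)
y = signal.lfilter(b, a, u) + 0.01 * np.random.randn(len(u))

f, S_uy = signal.csd(u, y, fs=fs, nperseg=2048)   # cross-spectral density
_, S_uu = signal.welch(u, fs=fs, nperseg=2048)    # input auto-spectral density
H = S_uy / S_uu
print("estimated gain at 1 Hz ~", np.abs(H[np.argmin(np.abs(f - 1.0))]))
```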

    Actuator Constrained Trajectory Generation and Control for Variable-Pitch Quadrotors

    Control and trajectory generation algorithms for a quadrotor helicopter with variable-pitch propellers are presented. The control law is not based on near-hover assumptions, allowing for large attitude deviations from hover. The trajectory generation algorithm fits a time-parametrized polynomial through any number of waypoints in R^3, with a closed-form solution if the corresponding waypoint arrival times are known a priori. When time is not specified, an algorithm for finding minimum-time paths subject to hardware actuator saturation limitations is presented. Attitude-specific constraints are easily embedded in the polynomial path formulation, allowing aerobatic maneuvers to be performed using a single controller and trajectory generation algorithm. Experimental results on a variable-pitch quadrotor demonstrate the control design and example trajectories. National Science Foundation (U.S.) (Graduate Research Fellowship under Grant No. 0645960)
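    A hedged sketch of the closed-form waypoint-fitting idea when arrival times are known, using a single polynomial per axis through a Vandermonde solve; the paper's formulation also handles attitude and actuator constraints, which are omitted here:

```python
# Fit one time-parametrized polynomial per axis through waypoints in R^3,
# given the arrival times, by solving a square Vandermonde system.
import numpy as np

def fit_polynomial_trajectory(times, waypoints):
    """times: (m,) arrival times; waypoints: (m, 3) positions.
    Returns C of shape (m, 3) with p(t) = sum_k C[k] * t**k."""
    A = np.vander(times, N=len(times), increasing=True)   # (m, m)
    return np.linalg.solve(A, waypoints)                  # one column per axis

def evaluate(C, t):
    powers = np.array([t ** k for k in range(C.shape[0])])
    return powers @ C

times = np.array([0.0, 1.0, 2.0, 3.5])
waypoints = np.array([[0, 0, 0], [1, 2, 1], [3, 2, 2], [4, 0, 1]], dtype=float)
C = fit_polynomial_trajectory(times, waypoints)
print(evaluate(C, 1.0))   # reproduces the second waypoint
```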

    Dynamic Bayesian Combination of Multiple Imperfect Classifiers

    Classifier combination methods need to make the best use of the outputs of multiple, imperfect classifiers to enable higher-accuracy classifications. In many situations, such as when human decisions need to be combined, the base decisions can vary enormously in reliability. A Bayesian approach to such uncertain combination allows us to infer the differences in performance between individuals and to incorporate any available prior knowledge about their abilities when training data is sparse. In this paper we explore Bayesian classifier combination, using the computationally efficient framework of variational Bayesian inference. We apply the approach to real data from a large citizen science project, Galaxy Zoo Supernovae, and show that our method far outperforms other established approaches to imperfect decision combination. We go on to analyse the putative community structure of the decision makers, based on their inferred decision-making strategies, and show that natural groupings are formed. Finally, we present a dynamic Bayesian classifier combination approach and investigate the changes in base classifier performance over time. Comment: 35 pages, 12 figures
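    A much-simplified, non-variational stand-in for the combination rule: per-classifier confusion matrices with Dirichlet smoothing, combined under an independence assumption. The paper infers these quantities with variational Bayes; this sketch only illustrates how per-classifier reliability enters the combined posterior:

```python
# Learn a smoothed confusion matrix per base classifier from a small labelled
# set, then combine new decisions with a naive-Bayes product rule.
import numpy as np

def fit_confusions(base_labels, true_labels, n_classes, alpha=1.0):
    """base_labels: (K, N) decisions of K base classifiers.
    Returns pi[k, true, observed], row-normalised with Dirichlet smoothing alpha."""
    K, _ = base_labels.shape
    pi = np.full((K, n_classes, n_classes), alpha)
    for k in range(K):
        for y, c in zip(true_labels, base_labels[k]):
            pi[k, y, c] += 1.0
    return pi / pi.sum(axis=2, keepdims=True)

def combine(decisions, pi, prior):
    """decisions: (K,) one label per base classifier; returns posterior over classes."""
    log_post = np.log(prior).copy()
    for k, c in enumerate(decisions):
        log_post += np.log(pi[k, :, c])
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

# Toy example: two noisy binary classifiers.
base = np.array([[0, 1, 1, 0, 1], [0, 0, 1, 0, 1]])
truth = np.array([0, 1, 1, 0, 1])
pi = fit_confusions(base, truth, n_classes=2)
print(combine(np.array([1, 1]), pi, prior=np.array([0.5, 0.5])))
```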

    Comparison of Fixed and Variable Pitch Actuators for Agile Quadrotors

    This paper presents the design, analysis, and experimental testing of a variable-pitch quadrotor. A custom in-lab built quadrotor with on-board attitude stabilization is developed and tested. An analysis of the dynamic differences in thrust output between a fixed-pitch and a variable-pitch propeller is given and validated with simulation and experimental results. It is shown that variable-pitch actuation has significant advantages over the conventional fixed-pitch configuration, including increased thrust rate of change, decreased control saturation, and the ability to quickly and efficiently reverse thrust. These advantages result in improved quadrotor tracking of linear and angular acceleration command inputs in both simulation and hardware testing. The benefits should enable more aggressive and aerobatic flying with the variable-pitch quadrotor than with standard fixed-pitch actuation, while retaining much of the mechanical simplicity and robustness of the fixed-pitch quadrotor. Aurora Flight Sciences Corp. National Science Foundation (U.S.) (Graduate Research Fellowship Grant 0645960)
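    A toy numerical comparison of the thrust-response argument above, assuming a first-order motor lag for the fixed-pitch rotor and a much faster servo lag for blade pitch; the time constants and the linearized thrust relations are illustrative assumptions, not measured values:

```python
# Compare how quickly thrust follows a step command when it must come through
# rotor-speed changes (fixed pitch) versus blade-pitch changes (variable pitch).
import numpy as np

dt, t_end = 0.001, 0.6
t = np.arange(0, t_end, dt)
tau_motor, tau_servo = 0.12, 0.02        # assumed actuator time constants [s]

def first_order_step(tau):
    """Response of a unit step command through a first-order lag."""
    x = np.zeros_like(t)
    for i in range(1, len(t)):
        x[i] = x[i - 1] + dt / tau * (1.0 - x[i - 1])
    return x

omega = first_order_step(tau_motor)      # normalised rotor speed
theta = first_order_step(tau_servo)      # normalised blade pitch
thrust_fixed = omega ** 2                # T ~ omega^2 at fixed pitch
thrust_variable = theta                  # T ~ pitch at constant speed (linearised)

for frac in (0.63, 0.95):
    tf = t[np.argmax(thrust_fixed >= frac)]
    tv = t[np.argmax(thrust_variable >= frac)]
    print(f"time to {frac:.0%} thrust: fixed-pitch {tf*1000:.0f} ms, "
          f"variable-pitch {tv*1000:.0f} ms")
```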

    Deep active learning for autonomous navigation.

    Imitation learning refers to an agent's ability to mimic a desired behavior by learning from observations. A major challenge in learning from demonstrations is representing the demonstrations in a manner that is adequate for learning and efficient for real-time decisions. Creating feature representations is especially challenging when they must be extracted from high-dimensional visual data. In this paper, we present a method for imitation learning from raw visual data. The proposed method is applied to a popular imitation learning domain that is relevant to a variety of real-life applications, namely navigation. To create a training set, a teacher uses an optimal policy to perform a navigation task, and the actions taken are recorded along with visual footage from the first-person perspective. Features are automatically extracted and used to learn a policy that mimics the teacher via a deep convolutional neural network. A trained agent can then predict which action to perform based on the scene it finds itself in. This method is generic, and the network is trained without knowledge of the task, targets, or environment in which it is acting. Another common challenge in imitation learning is generalizing a policy to situations that are not represented in the training data. To address this challenge, the learned policy is subsequently improved by employing active learning: while the agent is executing a task, it can query the teacher for the correct action to take in situations where it has low confidence. The active samples are added to the training set and used to update the initial policy. The proposed approach is demonstrated on four different tasks in a 3D simulated environment. The experiments show that an agent can effectively perform imitation learning from raw visual data for navigation tasks, and that active learning can significantly improve the initial policy using a small number of samples. The simulated test bed facilitates reproduction of these results and comparison with other approaches.
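    A hedged sketch of the general recipe: a small convolutional policy network trained on (image, teacher action) pairs, plus a confidence-based rule for querying the teacher. The architecture, input size, and threshold below are assumptions, not the authors' settings:

```python
# Behavioral cloning from raw images with an active-learning query rule.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyCNN(nn.Module):
    def __init__(self, n_actions=4):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=5, stride=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=2)
        self.fc = nn.Linear(32 * 13 * 13, n_actions)   # sized for 64x64 RGB input

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        return self.fc(x.flatten(start_dim=1))

def train_step(policy, optimizer, images, teacher_actions):
    """One supervised update on a batch of demonstration frames."""
    optimizer.zero_grad()
    loss = F.cross_entropy(policy(images), teacher_actions)
    loss.backward()
    optimizer.step()
    return loss.item()

def should_query_teacher(policy, image, threshold=0.6):
    """Active-learning rule: ask the teacher when the policy is uncertain."""
    with torch.no_grad():
        probs = F.softmax(policy(image.unsqueeze(0)), dim=1)
    return probs.max().item() < threshold

# Toy usage with random tensors standing in for recorded demonstrations.
policy = PolicyCNN()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
images, actions = torch.rand(8, 3, 64, 64), torch.randint(0, 4, (8,))
print(train_step(policy, opt, images, actions))
print(should_query_teacher(policy, torch.rand(3, 64, 64)))
```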

    Embodied imitation-enhanced reinforcement learning in multi-agent systems

    Imitation is an example of social learning in which an individual observes and copies another's actions. This paper presents a new method for using imitation to enhance the learning speed of individual agents that employ a well-known reinforcement learning algorithm, namely Q-learning. Compared with other research that combines imitation with reinforcement learning, our method uses imitation of purely observed behaviours to enhance learning, with no internal state access or sharing of experiences between agents. The paper evaluates our imitation-enhanced reinforcement learning approach both in simulation and with real robots in continuous space. Both simulation and real-robot experimental results show that the learning speed of the group is improved.
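    A hedged, tabular illustration of the idea on a toy chain world: a Q-learner whose exploratory actions are sometimes replaced by the action another agent was observed to take in the same state, with no access to that agent's internal values or rewards. The environment and parameters are invented for the sketch:

```python
# Q-learning where exploration is biased toward observed behaviour of another agent.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 2                       # chain world: move left/right
Q = np.zeros((n_states, n_actions))
observed_actions = np.ones(n_states, dtype=int)   # demonstrator seen moving right

def step(s, a):
    s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
    return s_next, 1.0 if s_next == n_states - 1 else 0.0

alpha, gamma, eps, p_imitate = 0.1, 0.95, 0.2, 0.5
for episode in range(200):
    s = 0
    for _ in range(50):
        if rng.random() < eps:
            # Exploration: with probability p_imitate, copy the observed behaviour.
            a = observed_actions[s] if rng.random() < p_imitate else rng.integers(n_actions)
        else:
            a = int(np.argmax(Q[s]))
        s_next, r = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if r > 0:
            break
print(np.argmax(Q, axis=1))   # learned greedy action per state
```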

    The field high-amplitude SX Phe variable BL Cam: results from a multisite photometric campaign. II. Evidence of a binary - possibly triple - system

    Short-period, high-amplitude pulsating stars of Population I (δ Sct stars) and Population II (SX Phe variables) exist in the lower part of the classical (Cepheid) instability strip. Most of them have very simple pulsational behaviours, with only one or two radial modes excited. Nevertheless, BL Cam is a unique object among them, being an extremely metal-deficient field high-amplitude SX Phe variable with a large number of frequencies. Based on a frequency analysis, a pulsational interpretation was previously given. Aims: We attempt to interpret the long-term behaviour of the residuals that were not taken into account in the previous Observed-Calculated (O-C) short-term analyses. Methods: An investigation of the O-C times was carried out, using a data set based on previously published times of light maxima, largely enriched by those obtained during an intensive multisite photometric campaign of BL Cam lasting several months. Results: In addition to a positive secular relative increase of (161 ± 3) × 10⁻⁹ yr⁻¹ in the main pulsation period of BL Cam, we detected in the O-C data short-term (144.2 d) and long-term (~3400 d) variations, both incompatible with a scenario of stellar evolution. Conclusions: Interpreted as a light travel-time effect, the short-term O-C variation is indicative of a massive stellar companion (0.46 to 1 M_⊙) in a short-period orbit (144.2 d), within a distance of 0.7 AU from the primary. More observations are needed to confirm the long-term O-C variations: if they too were caused by a light travel-time effect, they could be interpreted in terms of a third component, in this case probably a brown dwarf (≥ 0.03 M_⊙), orbiting in ~3400 d at a distance of 4.5 AU from the primary. Comment: 7 pages, 5 figures, accepted for publication in A&
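    A hedged sketch of the kind of O-C analysis described above, fitting synthetic times of maximum light with a quadratic secular term plus a light travel-time sinusoid. The orbital period is taken from the abstract; the pulsation period is an assumed approximate value for BL Cam, and the data and fitted numbers are made up:

```python
# Fit O-C residuals versus cycle number with a linear ephemeris correction,
# a quadratic secular period change, and a light travel-time sinusoid.
import numpy as np
from scipy.optimize import curve_fit

P_puls = 0.0391      # main pulsation period [d]; approximate value, an assumption
P_orbit = 144.2      # orbital period [d], from the abstract

def oc_model(E, a, b, c, A, phi):
    """O-C [d] versus cycle number E."""
    t = E * P_puls
    return a + b * E + c * E**2 + A * np.sin(2 * np.pi * t / P_orbit + phi)

E = np.linspace(0, 60000, 300)                        # cycle numbers of observed maxima
true = oc_model(E, 1e-4, 2e-8, 3e-13, 6e-4, 0.8)      # synthetic "truth"
oc_obs = true + 2e-4 * np.random.randn(E.size)        # synthetic measurement noise
params, _ = curve_fit(oc_model, E, oc_obs, p0=[0, 0, 0, 1e-3, 0])
print("fitted light travel-time amplitude [s]:", params[3] * 86400)
```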

    Deep imitation learning for 3D navigation tasks

    Deep learning techniques have shown success in learning from raw, high-dimensional data in various applications. While deep reinforcement learning has recently gained popularity as a method to train intelligent agents, the use of deep learning in imitation learning has been scarcely explored. Imitation learning can be an efficient way to teach intelligent agents by providing a set of demonstrations to learn from. However, generalizing to situations that are not represented in the demonstrations can be challenging, especially in 3D environments. In this paper, we propose a deep imitation learning method to learn navigation tasks from demonstrations in a 3D environment. The supervised policy is refined using active learning in order to generalize to unseen situations. This approach is compared to two popular deep reinforcement learning techniques: deep Q-networks (DQN) and asynchronous advantage actor-critic (A3C). The proposed method, as well as the reinforcement learning methods, employs deep convolutional neural networks and learns directly from raw visual input. Methods for combining learning from demonstrations and learning from experience are also investigated; this combination aims to join the generalization ability of learning by experience with the efficiency of learning by imitation. The proposed methods are evaluated on four navigation tasks in a 3D simulated environment. Navigation tasks are a typical problem relevant to many real applications; they pose the challenge of requiring demonstrations of long trajectories to reach the target while providing only delayed (usually terminal) rewards to the agent. The experiments show that the proposed method can successfully learn navigation tasks from raw visual input, whereas the learning-from-experience methods fail to learn an effective policy. Moreover, it is shown that active learning can significantly improve the performance of the initially learned policy using a small number of active samples.
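    One way the "combining demonstrations and experience" idea could look, sketched as a single network trained with a TD loss on experience plus a supervised loss on demonstrated actions; this is an assumed combination for illustration, not necessarily the one used in the paper:

```python
# Combined loss: one-step Q-learning on experience + behavioral cloning on demos.
import torch
import torch.nn.functional as F

def combined_loss(q_net, target_net, demo_batch, exp_batch, gamma=0.99, lam=1.0):
    # Supervised imitation term on demonstration (state, teacher action) pairs.
    demo_states, demo_actions = demo_batch
    imitation = F.cross_entropy(q_net(demo_states), demo_actions)

    # One-step TD term on experience (s, a, r, s', done) tuples.
    s, a, r, s_next, done = exp_batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    td = F.smooth_l1_loss(q_sa, target)
    return td + lam * imitation

# Toy usage with a linear Q-network over an 8-dimensional state.
q_net, target_net = torch.nn.Linear(8, 4), torch.nn.Linear(8, 4)
demo = (torch.rand(16, 8), torch.randint(0, 4, (16,)))
exp = (torch.rand(16, 8), torch.randint(0, 4, (16,)), torch.rand(16),
       torch.rand(16, 8), torch.zeros(16))
print(combined_loss(q_net, target_net, demo, exp).item())
```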

    Ground Delay Program Analytics with Behavioral Cloning and Inverse Reinforcement Learning

    We used historical data to build two types of models that predict Ground Delay Program (GDP) implementation decisions and also produce insights into how and why those decisions are made. More specifically, we built behavioral cloning and inverse reinforcement learning models that predict hourly Ground Delay Program implementation at Newark Liberty International and San Francisco International airports. Data available to the models include actual and scheduled air traffic metrics and observed and forecast weather conditions. We found that the random forest behavioral cloning models we developed are substantially better at predicting hourly Ground Delay Program implementation for these airports than the inverse reinforcement learning models we developed. However, all of the models struggle to predict the initialization and cancellation of Ground Delay Programs. We also investigated the structure of the models in order to gain insights into Ground Delay Program implementation decision making. Notably, characteristics of both types of models suggest that GDP implementation decisions are more tactical than strategic: they are made primarily based on current conditions or conditions anticipated only in the next couple of hours.
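    A hedged sketch of the behavioral-cloning setup described above: a random forest that predicts whether a GDP is in effect in a given hour from traffic and weather features. The feature names, data, and labels below are placeholders, not the study's actual inputs:

```python
# Random-forest behavioral cloning of hourly GDP implementation decisions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_hours = 5000
X = np.column_stack([
    rng.poisson(40, n_hours),        # scheduled arrivals in the hour (assumed feature)
    rng.normal(8, 3, n_hours),       # forecast visibility [mi] (assumed feature)
    rng.normal(15, 8, n_hours),      # forecast wind speed [kt] (assumed feature)
])
# Placeholder label: GDP more likely under low visibility and high demand.
y = ((X[:, 1] < 5) & (X[:, 0] > 38)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("hourly GDP prediction accuracy:", clf.score(X_te, y_te))
print("feature importances:", clf.feature_importances_)
```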